-
-
Notifications
You must be signed in to change notification settings - Fork 5.7k
llvm: Fix alloca alignment and type selection in AllocOpt #60699
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Inherit alignment from the original GC allocation with JL_SMALL_BYTE_ALIGNMENT as the minimum. Use alignment-sized integer chunks for the alloca type (matching emit_static_alloca) so SROA splits allocations into aligned pieces for better performance and vectorization. Also adds the missing setAlignment call in splitOnStack. Co-Authored-By: Claude Opus 4.5 <[email protected]>
| if (sz > 1) | ||
| align = MinAlign(JL_SMALL_BYTE_ALIGNMENT, NextPowerOf2(sz)); | ||
| // Inherit alignment from the original allocation, with GC alignment as minimum. | ||
| Align align(std::max((unsigned)orig_inst->getRetAlign().valueOrOne().value(), (unsigned)JL_SMALL_BYTE_ALIGNMENT)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels loosely unsound, since this is the minimum known alignment, and not the required alignment. The JL_SMALL_BYTE_ALIGNMENT value is the largest value that julia.gc_alloc_obj is permitted to return, so it is sometimes reasonable that we can use this as a hint, but we should be sure to clarify that this overalignment is merely a hint to the layout (although being more than 16 will penalize performance since it requires a more expensive stack adjustment on entry)
(unsigned)orig_inst->getRetAlign().valueOrOne().value()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But if the allocation required a larger alignment wouldn't we inherit that, that's why we prefer to inherit and just baseline to gc align
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getRetAlign is not the required alignment, it is the minimum, so if it requires it, this would introduce a bug here
but that said, the function isn't capable of giving more than JL_SMALL_BYTE_ALIGNMENT (16) so having getRetAlign return more than 16 here would be a miscompile, so this always gives the correct answer anyways (and increasing from there is only a runtime performance penalty, not a correctness issue)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It’s the alignment we emitted no? If it required more it would already be a bug no? Unless you mean a gc alloc aligned that wouldn’t tell LLVM
test/llvmpasses/alloc-opt-gcframe.ll
Outdated
|
|
||
| ; CHECK-LABEL: @ccall_ptr | ||
| ; CHECK: alloca i64 | ||
| ; CHECK: alloca i128, align 16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc: @maleadt
This might cause issues for the intel backend? If I recall correctly, they don't like i128?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty sure Our Int128 emits i128.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this i128 should be storage only most of the time, I just followed whatever we do for base Julia
Some backends don't support integer types larger than 64 bits, so cap the element size used in emit_static_alloca and AllocOpt at i64. For allocations larger than 8 bytes, use arrays of i64 instead of i128/i256. Co-Authored-By: Claude Opus 4.5 <[email protected]>
Inherit alignment from the original GC allocation with JL_SMALL_BYTE_ALIGNMENT as the minimum. Use alignment-sized integer chunks for the alloca type (matching emit_static_alloca) so SROA splits allocations into aligned pieces for better performance and vectorization. Also adds the missing setAlignment call in splitOnStack. Co-Authored-By: Claude Opus 4.5 <[email protected]> --------- Co-authored-by: Claude Opus 4.5 <[email protected]> (cherry picked from commit 54fde7e)
Inherit alignment from the original GC allocation with JL_SMALL_BYTE_ALIGNMENT as the minimum. Use alignment-sized integer chunks for the alloca type (matching emit_static_alloca) so SROA splits allocations into aligned pieces for better performance and vectorization. Also adds the missing setAlignment call in splitOnStack. Co-Authored-By: Claude Opus 4.5 <[email protected]> --------- Co-authored-by: Claude Opus 4.5 <[email protected]> (cherry picked from commit 54fde7e)
Inherit alignment from the original GC allocation with JL_SMALL_BYTE_ALIGNMENT
as the minimum. Use alignment-sized integer chunks for the alloca type
(matching emit_static_alloca) so SROA splits allocations into aligned pieces
for better performance and vectorization.
Also adds the missing setAlignment call in splitOnStack.
Co-Authored-By: Claude Opus 4.5 [email protected]